Varying Slopes Models
General Principles
A varying slopes model describes the relationship between predictor variables and a dependent variable while allowing the effects to vary across groups or clusters.
This approach is useful when we expect the relationship between predictors and the dependent variable to differ across groups (e.g., different slopes for different subjects, locations, or time periods). It allows every unit in the data to have its own response to a treatment, exposure, or event, while also improving the estimates through partial pooling.
Considerations
The same considerations apply as in Chapter 12, Varying Intercepts.
The idea is similar to categorical models, where a slope is specified for each category. Here, however, we also estimate how the varying effects relate to one another across groups. This requires a different mathematical approach: to capture these relationships, we model a covariance matrix.
To construct the covariance matrix, we use an SRS decomposition, \Sigma = S R S, where S is a diagonal matrix of standard deviations and R is a correlation matrix. The correlation matrix is given an LKJcorr prior, parametrized by a single concentration parameter \eta that controls the amount of regularization. \eta is usually set to 2, which defines a weakly informative prior that is skeptical of extreme correlations near -1 or 1. With LKJcorr(1) the prior is flat over all valid correlation matrices; values greater than 1 make extreme correlations less likely.
The standard deviations in S are modeled with a prior that constrains them to strictly positive values (an Exponential(1) prior in the examples below).
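As a minimal sketch in plain JAX (not the BI API; the numbers are hypothetical), the two ways of writing the SRS decomposition give the same covariance matrix:

import jax.numpy as jnp

# Hypothetical values: two varying effects (intercept and slope)
sigma = jnp.array([1.0, 0.5])                  # S: standard deviations of the varying effects
Rho = jnp.array([[1.0, -0.7],
                 [-0.7, 1.0]])                 # R: correlation matrix (e.g. a draw from LKJcorr(2))

S = jnp.diag(sigma)
cov_srs = S @ Rho @ S                          # Sigma = S R S
cov_outer = jnp.outer(sigma, sigma) * Rho      # identical result, the form used in the example below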
Example
Below is an example code snippet demonstrating Bayesian regression with varying effects. This example is based on McElreath (2018).
Simulated data
from BI import bi, jnp
# Setup device------------------------------------------------
m = bi(platform='cpu')
# Import Data & Data Manipulation ------------------------------------------------
# Import
data_path = m.load.sim_multivariate_normal(only_path=True)
m.data(data_path, sep=',')
m.data_on_model = dict(
    cafe = jnp.array(m.df.cafe.values, dtype=jnp.int32),
    wait = jnp.array(m.df.wait.values, dtype=jnp.float32),
    N_cafes = len(m.df.cafe.unique()),
    afternoon = jnp.array(m.df.afternoon.values, dtype=jnp.float32)
)
# Define model ------------------------------------------------
def model(cafe, wait, N_cafes, afternoon):
    # Global (average) intercept and slope
    a = m.dist.normal(5, 2, name = 'a')
    b = m.dist.normal(-1, 0.5, name = 'b')
    # Standard deviations of the varying effects and of the residuals
    sigma_cafe = m.dist.exponential(1, shape=(2,), name = 'sigma_cafe')
    sigma = m.dist.exponential(1, name = 'sigma')
    # Correlation matrix with an LKJcorr(2) prior
    Rho = m.dist.lkj(2, 2, name = 'Rho')
    # Covariance matrix: Sigma = S R S
    cov = jnp.outer(sigma_cafe, sigma_cafe) * Rho
    # One intercept and one slope per cafe, drawn jointly
    a_cafe_b_cafe = m.dist.multivariate_normal(jnp.stack([a, b]), cov, shape = [N_cafes], name = 'a_b_cafe')
    a_cafe, b_cafe = a_cafe_b_cafe[:, 0], a_cafe_b_cafe[:, 1]
    mu = a_cafe[cafe] + b_cafe[cafe] * afternoon
    m.dist.normal(mu, sigma, obs=wait)
# Run sampler ------------------------------------------------
m.fit(model)
# Summary ------------------------------------------------
m.summary()

The same model can also be written with the m.effects.varying_effects helper:

from BI import bi, jnp
# Setup device------------------------------------------------
m = bi(platform='cpu')
# Import Data & Data Manipulation ------------------------------------------------
# Import
data_path = m.load.sim_multivariate_normal(only_path=True)
m.data(data_path, sep=',')
m.data_on_model = dict(
    cafe = jnp.array(m.df.cafe.values, dtype=jnp.int32),
    wait = jnp.array(m.df.wait.values, dtype=jnp.float32),
    N_cafes = len(m.df.cafe.unique()),
    afternoon = jnp.array(m.df.afternoon.values, dtype=jnp.float32)
)
# Define model ------------------------------------------------
def model(cafe, wait, N_cafes, afternoon):
    sigma = m.dist.exponential(1, name = 'sigma')
    varying_intercept, varying_slope = m.effects.varying_effects(
        N_vars = 1,
        N_group = N_cafes,
        group_id = cafe,
        group_name = 'cafe',
        centered = False)
    mu = varying_intercept + varying_slope * afternoon
    m.dist.normal(mu, sigma, obs=wait)
# Run sampler ------------------------------------------------
m.fit(model)

The same model in R:

library(BayesianInference)
jnp = reticulate::import('jax.numpy')
# Setup platform------------------------------------------------
m=importBI(platform='cpu')
# Import data ------------------------------------------------
m$data(paste(system.file(package = "BayesianInference"),"/data/Sim data multivariatenormal.csv", sep = ''), sep=',')
m$data_to_model(list('cafe', 'wait', 'afternoon'))
m$data_on_model
# Define model ------------------------------------------------
model <- function(cafe, afternoon, wait, N_cafes = as.integer(20)){
  a = bi.dist.normal(5, 2, name = 'a')
  b = bi.dist.normal(-1, 0.5, name = 'b')
  sigma_cafe = bi.dist.exponential(1, shape = c(2), name = 'sigma_cafe')
  sigma = bi.dist.exponential(1, name = 'sigma')
  Rho = bi.dist.lkj(as.integer(2), as.integer(2), name = 'Rho')
  cov = jnp$outer(sigma_cafe, sigma_cafe) * Rho
  a_cafe_b_cafe = bi.dist.multivariate_normal(
    jnp$squeeze(jnp$stack(list(a, b))),
    cov,
    shape = c(N_cafes), name = 'a_cafe')
  a_cafe = a_cafe_b_cafe[, 0]
  b_cafe = a_cafe_b_cafe[, 1]
  mu = a_cafe[cafe] + b_cafe[cafe] * afternoon
  bi.dist.normal(mu, sigma, obs=wait)
}
# Run MCMC ------------------------------------------------
m$fit(model) # Optimize model parameters through MCMC sampling
# Summary ------------------------------------------------
m$summary() # Get posterior distributions

The same model in Julia:

using BayesianInference
# Setup device------------------------------------------------
m = importBI(platform="cpu")
# Import Data & Data Manipulation ------------------------------------------------
# Import
data_path = m.load.sim_multivariate_normal(only_path = true)
m.data(data_path, sep=',')
m.data_on_model = pybuiltins.dict(
    cafe = jnp.array(m.df.cafe.values, dtype=jnp.int32),
    wait = jnp.array(m.df.wait.values, dtype=jnp.float32),
    N_cafes = length(m.df.cafe.unique()),
    afternoon = jnp.array(m.df.afternoon.values, dtype=jnp.float32)
)
# Define model ------------------------------------------------
@BI function model(cafe, wait, N_cafes, afternoon)
    a = m.dist.normal(5, 2, name = "a")
    b = m.dist.normal(-1, 0.5, name = "b")
    sigma_cafe = m.dist.exponential(1, shape=(2,), name = "sigma_cafe")
    sigma = m.dist.exponential(1, name = "sigma")
    Rho = m.dist.lkj(2, 2, name = "Rho")
    cov = jnp.outer(sigma_cafe, sigma_cafe) * Rho
    a_cafe_b_cafe = m.dist.multivariate_normal(jnp.stack([a, b]), cov, shape = [N_cafes], name = "a_b_cafe")
    a_cafe, b_cafe = a_cafe_b_cafe[:, 0], a_cafe_b_cafe[:, 1]
    mu = a_cafe[cafe] + b_cafe[cafe] * afternoon
    m.dist.normal(mu, sigma, obs=wait)
end
# Run mcmc ------------------------------------------------
m.fit(model) # Optimize model parameters through MCMC sampling
# Summary ------------------------------------------------
m.summary() # Get posterior distributions

Mathematical Details
Centered parameterization
The varying slopes model is a hierarchical model in which each group's intercept and slope are drawn jointly from a common multivariate normal distribution.
The varying intercepts (\alpha_k) and slopes (\beta_k) are modeled using a Multivariate Normal distribution:
\begin{pmatrix} \alpha_k \\ \beta_k \end{pmatrix} \sim \text{MultivariateNormal}\left( \begin{pmatrix} \bar{\alpha} \\ \bar{\beta} \end{pmatrix}, \text{diag}(\varsigma) ~ \Omega ~ \text{diag}(\varsigma) \right)
\bar{\alpha} \sim \text{Normal}(0, 1)
\bar{\beta} \sim \text{Normal}(0, 1)
\varsigma \sim \text{Exponential}(1)
\Omega \sim \text{LKJ}(\eta)
Where:
\begin{pmatrix} \bar{\alpha} \\ \bar{\beta} \end{pmatrix} is the vector of global parameters, obtained by stacking the global intercept and the global slope(s).
\varsigma is a vector giving the standard deviation of the random effects for the intercept and slopes across groups.
\Omega is the correlation matrix.
Non-centered parameterization
For computational reasons, it is often better to use a non-centered parameterization, which is equivalent to the multivariate normal approach above:
\begin{pmatrix} \alpha_k \\ \beta_k \end{pmatrix} = \begin{pmatrix} \bar{\alpha} \\ \bar{\beta} \end{pmatrix} + \varsigma \circ \left( L \cdot \begin{pmatrix} \widehat{\alpha}_k \\ \widehat{\beta}_k \end{pmatrix} \right)
\bar{\alpha} \sim \text{Normal}(0, 1)
\bar{\beta} \sim \text{Normal}(0, 1)
\varsigma \sim \text{Exponential}(1)
L \sim \text{LKJ Cholesky}(\eta)
\widehat{\alpha}_k \sim \text{Normal}(0, 1)
\widehat{\beta}_k \sim \text{Normal}(0, 1)
Where:
\varsigma = (\sigma_\alpha, \sigma_\beta), where \sigma_\alpha \sim \text{Exponential}(1) is the prior standard deviation among intercepts and \sigma_\beta \sim \text{Exponential}(1) is the prior standard deviation among slopes.
L \sim \text{LKJ Cholesky}(\eta) is the Cholesky factor of the correlation matrix, so that \Omega = L L^\top.
\widehat{\alpha}_k and \widehat{\beta}_k are the standardized (unit-normal) varying effects for group k.
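A minimal sketch in plain JAX (not the BI API; all values are hypothetical) of how the non-centered construction recovers the group-level intercepts and slopes:

import jax
import jax.numpy as jnp

N_groups = 20
alpha_bar, beta_bar = 3.5, -1.0                      # global intercept and slope
varsigma = jnp.array([1.0, 0.5])                     # sd of varying intercepts and slopes
Omega = jnp.array([[1.0, -0.7],
                   [-0.7, 1.0]])                     # correlation matrix
L = jnp.linalg.cholesky(Omega)                       # LKJ Cholesky factor

# Standardized effects z_k ~ Normal(0, 1), drawn here only for illustration
z = jax.random.normal(jax.random.PRNGKey(0), (N_groups, 2))

# (alpha_k, beta_k) = (alpha_bar, beta_bar) + varsigma * (L @ z_k), vectorized over groups
effects = jnp.array([alpha_bar, beta_bar]) + varsigma * (z @ L.T)
alpha_k, beta_k = effects[:, 0], effects[:, 1]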
Multivariate Model with One Random Slope for Each Variable
We can apply a multivariate model similarly to Chapter 2. In this case we apply the same principle, but with a covariance matrix whose dimension equals the number of varying effects we define. For example, if we want each group k to have its own slope for two independent variables X_1 and X_2, we can define the model for observations i as follows:
Y_{i} \sim \text{Normal}(\mu_i , \sigma)
\mu_i = \alpha_{k(i)} + \beta_{k(i)} X_{1i} + \gamma_{k(i)} X_{2i}
\begin{pmatrix} \alpha_k \\ \beta_k \\ \gamma_k \end{pmatrix} = \begin{pmatrix} \bar{\alpha} \\ \bar{\beta} \\ \bar{\gamma} \end{pmatrix} + \varsigma \circ \left( L \cdot \begin{pmatrix} \widehat{\alpha}_{k} \\ \widehat{\beta}_{k} \\ \widehat{\gamma}_{k} \end{pmatrix} \right)
\bar{\alpha} \sim \text{Normal}(0, 1)
\bar{\beta} \sim \text{Normal}(0, 1)
\bar{\gamma} \sim \text{Normal}(0, 1)
\varsigma \sim \text{Exponential}(1)
L \sim \text{LKJ Cholesky}(2)
\widehat{\alpha}_k \sim \text{Normal}(0, 1)
\widehat{\beta}_k \sim \text{Normal}(0, 1)
\widehat{\gamma}_k \sim \text{Normal}(0, 1)
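As a hedged sketch following the same BI-style API as the café example above (the names group, y, X1, X2, and N_groups are hypothetical, and the three-dimensional lkj call assumes the same (dimension, eta) signature as the two-dimensional one used earlier):

def model(group, y, X1, X2, N_groups):
    a = m.dist.normal(0, 1, name = 'a_bar')                   # global intercept
    b = m.dist.normal(0, 1, name = 'b_bar')                   # global slope for X1
    g = m.dist.normal(0, 1, name = 'g_bar')                   # global slope for X2
    sigma_group = m.dist.exponential(1, shape=(3,), name = 'sigma_group')  # sd of the three varying effects
    sigma = m.dist.exponential(1, name = 'sigma')              # residual sd
    Rho = m.dist.lkj(3, 2, name = 'Rho')                       # 3x3 correlation matrix, eta = 2
    cov = jnp.outer(sigma_group, sigma_group) * Rho            # Sigma = S R S
    abg = m.dist.multivariate_normal(jnp.stack([a, b, g]), cov, shape = [N_groups], name = 'abg_group')
    a_g, b_g, g_g = abg[:, 0], abg[:, 1], abg[:, 2]
    mu = a_g[group] + b_g[group] * X1 + g_g[group] * X2
    m.dist.normal(mu, sigma, obs=y)

This mirrors the two-effect café model, simply extending the joint multivariate normal to three varying effects per group.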
Notes
We can apply interaction terms similarly to Chapter 3.
We can apply categorical variables similarly to Chapter 4.
We can apply multiple random effects in Bayesian inference as follows:
def model(cafe, wait, N_cafes, afternoon, state):
    sigma = m.dist.exponential(1, name = 'sigma')
    # ⚠️ Note: `alpha_bar` and `beta_bar` are set to zero to ensure the same intercepts for all varying effects. ⚠️
    varying_intercept_1, varying_slope_1 = m.effects.varying_effects(
        N_vars = 1,
        N_group = N_cafes,
        group_id = cafe,
        group_name = 'cafe',
        alpha_bar = jnp.array([0]),
        beta_bar = jnp.array([0]),
        centered = True,
    )
    varying_intercept_2, varying_slope_2 = m.effects.varying_effects(
        N_vars = 1,
        N_group = 4,
        group_id = state,
        group_name = 'states',
        alpha_bar = jnp.array([0]),
        beta_bar = jnp.array([0]),
        centered = True,
    )
    # Combine the two sets of varying effects
    varying_intercept = varying_intercept_1 + varying_intercept_2
    varying_slope = varying_slope_1 + varying_slope_2
    mu = varying_intercept + varying_slope[:, 0] * afternoon
    m.dist.normal(mu, sigma, obs=wait)

Where state is a second grouping variable (a second set of varying effects), representing cafes clustered by state.
⚠️ Note: alpha_bar and beta_bar are set to zero to ensure the same intercepts for all varying effects. ⚠️